-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various MTSAC bug fixes #1975
Various MTSAC bug fixes #1975
Conversation
@avnishn Thanks Avnishn! I think I forget to read contribution guidelines when submit the PR, sorry for making so much troubles to you. |
Codecov Report
@@ Coverage Diff @@
## master #1975 +/- ##
==========================================
- Coverage 93.50% 93.40% -0.11%
==========================================
Files 192 192
Lines 10182 10184 +2
Branches 1268 1269 +1
==========================================
- Hits 9521 9512 -9
- Misses 438 446 +8
- Partials 223 226 +3
Continue to review full report at Codecov.
|
env_spec=ml1_train_envs.spec, | ||
num_tasks=50, | ||
steps_per_epoch=epoch_cycles, | ||
replay_buffer=replay_buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please only call args as args and kwargs as kwargs.
'The correct number of tasks?') | ||
obs = torch.Tensor([env.reset()[0]] * buffer_batch_size) | ||
with pytest.raises(ValueError, match=error_string): | ||
mtsac._get_log_alpha(dict(observation=obs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to test a private method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, however, the tests are in place to verify the correctness of this implementation. I feel more comfortable having the tests than not. At the same time, it makes no sense to have this as a publicly exposed field because it has no use outside of the algorithm.
Please reference the issues you're fixing |
runner.setup(algo=mtsac, | ||
env=mt10_train_envs, | ||
sampler_cls=LocalSampler, | ||
n_workers=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why limit the number of workers here, and can't we use ray?
@maliesa96 we can't. Tldr; using the ray sampler with the old metaworld envs uses more memory than we have on lab machines. |
Damn alright, LGTM then. |
fixes to Examples to use the correct num_tasks fixes to max_episode_length_eval being used by the algorithm Co-authored-by: Tianhong Dai <tianhongdai914@gmail.com>
e191183
to
c83d34e
Compare
Backport #1905, #1975, #1908 to fix problems with max_eval_path_length being not used by mtsac and sac, and add checking for incorrect num_tasks being set in mtsac. Timelimit.truncated modified only when necessary This issue occurs when there are multiple garage envs that are nested or timelimit truncated = False is included in the environment keys. Previously, our timelimit truncated logic was written with the idea in mind that the key was only added when a time limit truncation occured. If an environment already has timelimit truncated = False in its keys then the previous behavior was to set Done = True which is the incorrect behavior. That was causing performance degradation in MTSAC and MTPPO/TRPO. Now Done is only true in the normal/trivial case, never if timelimit truncated is False.
Backport #1905, #1975, #1908 to fix problems with max_eval_path_length being not used by mtsac and sac, and add checking for incorrect num_tasks being set in mtsac. Timelimit.truncated modified only when necessary This issue occurs when there are multiple garage envs that are nested or timelimit truncated = False is included in the environment keys. Previously, our timelimit truncated logic was written with the idea in mind that the key was only added when a time limit truncation occured. If an environment already has timelimit truncated = False in its keys then the previous behavior was to set Done = True which is the incorrect behavior. That was causing performance degradation in MTSAC and MTPPO/TRPO. Now Done is only true in the normal/trivial case, never if timelimit truncated is False.
Backport #1905, #1975, #1908 to fix problems with max_eval_path_length being not used by mtsac and sac, and add checking for incorrect num_tasks being set in mtsac. Timelimit.truncated modified only when necessary This issue occurs when there are multiple garage envs that are nested or timelimit truncated = False is included in the environment keys. Previously, our timelimit truncated logic was written with the idea in mind that the key was only added when a time limit truncation occured. If an environment already has timelimit truncated = False in its keys then the previous behavior was to set Done = True which is the incorrect behavior. That was causing performance degradation in MTSAC and MTPPO/TRPO. Now Done is only true in the normal/trivial case, never if timelimit truncated is False.
Backport #1905, #1975, #1908 to fix problems with max_eval_path_length being not used by mtsac and sac, and add checking for incorrect num_tasks being set in mtsac. Timelimit.truncated modified only when necessary This issue occurs when there are multiple garage envs that are nested or timelimit truncated = False is included in the environment keys. Previously, our timelimit truncated logic was written with the idea in mind that the key was only added when a time limit truncation occured. If an environment already has timelimit truncated = False in its keys then the previous behavior was to set Done = True which is the incorrect behavior. That was causing performance degradation in MTSAC and MTPPO/TRPO. Now Done is only true in the normal/trivial case, never if timelimit truncated is False.
fixes to Examples to use the correct num_tasks
fixes to max_episode_length_eval being used by the algorithm
Co-authored-by: Tianhong Dai tianhongdai914@gmail.com